Abstract: Robust classification becomes challenging when each
class consists of multiple subclasses. Examples include multi-font
optical character recognition and automated protein function
prediction. In correlation-based nearest-neighbor classification,
the maximin correlation approach (MCA) provides the worst-case
optimal solution by minimizing the maximum misclassification
risk through an iterative procedure. Despite this optimality,
the original MCA has drawbacks that have limited its
applicability in practice: it tends to be sensitive
to outliers, cannot effectively handle nonlinearities in datasets,
and suffers from high computational complexity. To
address these limitations, we propose an improved solution,
named the regularized maximin correlation approach (R-MCA). We
first reformulate MCA as a quadratically constrained linear
programming (QCLP) problem, incorporate regularization by
introducing slack variables in the primal problem of the QCLP, and
derive the corresponding Lagrangian dual. The dual formulation
enables us to apply the kernel trick to R-MCA so that it can better
handle nonlinearities. Our experimental results demonstrate that
the regularization and kernelization make the proposed R-MCA
more robust and accurate for various classification tasks than the
original MCA. Furthermore, when the data size or dimensionality
grows, R-MCA runs substantially faster by solving either the
primal or dual (whichever has a smaller variable dimension) of
the QCLP.

Index Terms: nearest neighbor, correlation, maximin, SOCP,
QCLP, QP, regularization, kernel trick.


I. Introduction 

Nearest neighbor (NN) classifiers [1], [2] are non-parametric
methods that classify an object based on its distance to the
nearest trained class. Owing largely to their simplicity and
reasonable performance in practical problems, they have been
widely used for various tasks such as image retrieval [3], object
tracking [4], [5], location-dependent information service [6],
and predicting stability of nucleic acid secondary structure [7].

The main problems that arise with NN classifiers are that (1)
finding the neighbors becomes computationally intensive as
the number of training samples increases and (2) the notion of
nearest neighbors can break down in high-dimensional spaces.
Approaches have been proposed to reduce the computation [8]
and to adaptively determine nearest neighbors (even in
high-dimensional spaces) [9]. Template matching is another widely
used technique that pre-computes a representative vector for
each class and uses it to locate the nearest neighbor of
an object [10]. In multiple subclass classification problems,
where each class consists of multiple subclasses, a template is
constructed for each subclass, and then the aggregate template
of a class is created based on the subclass templates [11].

T. Lee and S. Yoon are with the Department of Electrical and Computer
Engineering, Seoul National University, Seoul 08826, Korea. E-mail:
sryoon@snu.ac.kr

T. Moon is with the Department of Information and Communication
Engineering, Daegu-Gyeongbuk Institute of Science and Technology (DGIST),
Daegu 42988, Korea.

S. Kim was with Citi Capital Advisors, New York, NY 10013, USA.

Manuscript received March, 2016.

In this paper, we consider constructing the aggregate
template based on the idea of the maximin correlation approach
(MCA) [11]. For correlation-based NN classification problems,
it is known that MCA can provide an optimal aggregate
template in that MCA iteratively maximizes the minimum
correlation with the templates it represents, eventually minimizing
the maximum misclassification risk. MCA was originally
proposed for multi-font optical character recognition [12] and
has been successfully applied to automated protein function
prediction [13] and typography clustering [14].

Despite the theoretical advantages of MCA, it has inherent
limitations that have hindered wider applications in practice,
such as susceptibility to noise and outliers, inability to
handle nonlinearities in datasets, as well as high computational
complexity. This paper proposes the regularized maximin
correlation approach (R-MCA), a significantly improved solution
method that overcomes these limitations of the original MCA.

As opposed to the iterative method employed by the original
MCA, we reformulate it as an instance of quadratically
constrained linear programming (QCLP) [15]. The worst-case
complexity of the iterative method grows quadratically
as the number of objects increases. In contrast, the proposed
QCLP formulation can be solved with complexity that is linear
in the number of objects by interior-point methods (IPMs) [16]
when the coefficient matrices are positive semidefinite.

Based on the QCLP formulation, we incorporate
regularization and additional constraints that help R-MCA to find a
robust representative vector even when (noisy) outliers exist.
Our formulation has some resemblance to the
regularization employed by the soft-margin support vector machine
(SVM) [17]. We furthermore develop the Lagrangian dual
of the regularized QCLP, which enables us to apply the
kernel trick to effectively handle nonlinear structures possibly
embedded in data.

This paper also presents experimental results that confirm
the effectiveness of R-MCA on various public data sets.
According to these results, the proposed R-MCA successfully
delivers the following improvements:

• QCLP-based reformulation of MCA that enables
acceleration, regularization and kernelization

• Regularization to fight overfitting and outliers

• Kernelization for discovering nonlinear structures

Note that R-MCA can contribute to devising a robust and
scalable solution to not only nearest-neighbor classification but
also a variety of other tasks based on finding group
representatives. For such tasks, R-MCA can provide an alternative to
conventional aggregates, such as centroids and medoids.

II. Maximin Correlation Approach (MCA) 

To make this paper self-contained, we briefly introduce
the mathematical formulation of the MCA.
Additional details can be found in [11].

Consider two non-zero vectors $u, x \in \mathbb{R}^m$. When $u$ and
$x$ are column vectors, the centered correlation is defined as
$\phi(u, x) = u^T x / (\|u\|_2 \|x\|_2)$. MCA maximizes an
objective function that is the worst-case (minimum) value of
the centered correlation between a non-zero vector $u$ and
all of the vectors in a set $X \subset \mathbb{R}^m$. MCA can construct a
template vector $u$ that maximizes the minimum correlation by
the following formulation:

\[
\begin{aligned}
\underset{u}{\text{maximize}} \quad & \min_{x \in X} \phi(u, x) \\
\text{subject to} \quad & \|u\|_2 \neq 0.
\end{aligned}
\tag{1}
\]

The optimization (1) is referred to as the MCA problem
(MCAP). The original MCA [11] assumes that all of the $x_i$'s
are linearly independent, $\|x_i\|_2 = 1$ for all $x_i \in X$, and
$x_i^T x_j > 0$ for all $x_i, x_j \in X$ (note that these assumptions are
not required in the proposed R-MCA). An iterative solution
to the MCAP was proposed in [11]: the template vector $u$
is initialized to the centroid vector and is updated at each
iteration to find the optimal vector $u^*$. For fixed $m$ (the
dimensionality), the worst-case complexity of this iterative
algorithm is $O(n^2)$, where $n$ is the number of objects in $X$.
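
As a minimal sketch of the objective in (1), the snippet below evaluates the correlation $\phi(u, x)$ and its worst-case value over a small set. The function names and the toy vectors are illustrative assumptions, not taken from the original MCA implementation, and the hand-picked candidate is not claimed to be the optimal template.

```python
import math

def correlation(u, x):
    """phi(u, x) = u^T x / (||u||_2 ||x||_2) for non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, x))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_x = math.sqrt(sum(b * b for b in x))
    return dot / (norm_u * norm_x)

def worst_case_correlation(u, X):
    """Inner minimum of the MCAP objective: smallest correlation over X."""
    return min(correlation(u, x) for x in X)

def centroid(X):
    """Component-wise mean of the vectors in X."""
    n = len(X)
    return [sum(col) / n for col in zip(*X)]

# Hypothetical toy set: three similar vectors and one pointing elsewhere.
X = [[1.0, 0.0], [1.0, 0.1], [1.0, -0.1], [0.0, 1.0]]
c = centroid(X)
u = [1.0, 1.0]  # hand-picked candidate template (an assumption)

print("centroid worst case :", worst_case_correlation(c, X))
print("candidate worst case:", worst_case_correlation(u, X))
```

On this toy set the centroid is pulled toward the three similar vectors, so its worst-case correlation is lower than that of the candidate; this is the quantity that MCA maximizes.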

III. Proposed R-MCA Methodology

This section presents the details of the proposed R-MCA
method. To propose more efficient solutions to the MCAP
(1), we first formulate it as an instance of QCLP [15]. The
QCLP formulation (2) enables us to find a solution using
general IPMs [16], instead of the iterative method proposed
in [11]. The QCLP formulation also allows us to define slack
variables that lead to a regularized version (3) that effectively
handles outliers. From the regularized version (3), we further
derive its Lagrangian dual form (7), which reveals the structure
suitable for applying the kernel trick. To handle nonlinearities,
we finally kernelize the dual form (7) into the kernelized
R-MCA formulation (9).

Note that the original MCA (i.e., the version without
regularization) can also be kernelized; starting from the QCLP
formulation (2), we derive its dual form (10) and the kernelized
MCA (11).


A. Geometric Interpretation

Fig. 1 shows the geometric interpretation and comparison
of MCA and the proposed R-MCA, which will be formally
defined in the next section. As shown in Fig. 1(a), solving
MCA is equivalent to finding a template vector whose
direction minimizes the worst-case angle between the vector and
class members. With no outliers, the maximin template that
MCA returns represents the group reasonably.

The existence of outliers significantly degrades the
performance of MCA. For instance, Fig. 1(b) shows the scenario
in which outliers are added to the data shown in Fig. 1(a).
The maximin template returned by the original MCA swings
abruptly towards the outliers because MCA does not recognize
outliers. In contrast, the r-maximin template returned by
R-MCA takes into account the outliers, yielding a template that
represents the group more reasonably.

As an example from real applications, Fig. 1(c) shows the
images of the character ‘A’ in ten different fonts and three
types of templates, each of which aims at representing the
ten images as a whole. In the centroid template, the two
‘outlier’ (boxed) fonts are averaged out and do not appear
well, whereas the maximin template preserves them to some
extent. For this reason, in multi-font character recognition,
the maximin template, which incorporates outlier information,
results in higher accuracy than the centroid template [11],
[13]. In other applications, however, representing outliers may
hurt classification accuracy. In R-MCA, we can adjust the
sensitivity to outliers, providing an intermediate representation
between the maximin and centroid templates (e.g., compare the
three templates in Fig. 1(c)).

Fig. 1: Geometric interpretation of regularization. (a) MCA
finds a vector whose direction minimizes the worst (i.e.,
maximum) angle between the vector and the class members.
(b) Adding outliers (the shaded region) causes an abrupt swing
in the traditional maximin that MCA returns. In contrast, the
r-maximin that R-MCA finds is more robust to outliers. The
objects on the dotted line from the origin have the minimum
correlation with the template vector u. (c) The character ‘A’
represented in 10 different fonts. (d) The three aggregate
templates of (c).


B. QCLP Formulation of MCA 

A simple trick allows us to reformulate (1) as a tractable
convex problem. After normalization of the input vectors, (1)
becomes equivalent to

\[
\begin{aligned}
\underset{u}{\text{maximize}} \quad & \min_{i = 1, \ldots, n} \left( u^T x_i \right) \\
\text{subject to} \quad & \|u\|_2 \le 1.
\end{aligned}
\]

The maximizer of the above maximin problem coincides with
the solution of the following optimization problem:

\[
\begin{aligned}
\underset{t \in \mathbb{R},\, u \in \mathbb{R}^m}{\text{maximize}} \quad & t \\
\text{subject to} \quad & u^T x_i \ge t, \quad i = 1, \ldots, n \\
& u^T u \le 1.
\end{aligned}
\tag{2}
\]

The equivalent formulation (2) for the MCAP with a finite
set $X$ is simple; it involves maximizing a linear function
(equivalently, minimizing $-t$) over $m + 1$ variables, with $n$
linear inequality constraints and one quadratic constraint. It is
an instance of QCLP, a special type of optimization problem
that can be solved globally and efficiently by the IPMs [16].
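
To make the QCLP concrete, the following small worked example (constructed here for illustration; it does not appear in the original derivation) solves the problem by hand for two orthonormal inputs $x_1 = (1, 0)^T$ and $x_2 = (0, 1)^T$.

```latex
\begin{aligned}
&\text{maximize } t \quad \text{subject to } u_1 \ge t,\; u_2 \ge t,\; u_1^2 + u_2^2 \le 1. \\
&\text{For } t \ge 0:\quad 2t^2 \le u_1^2 + u_2^2 \le 1
  \;\Longrightarrow\; t \le \tfrac{1}{\sqrt{2}}, \\
&\text{with equality at } u^* = \left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right)^T,
  \qquad t^* = \tfrac{1}{\sqrt{2}}.
\end{aligned}
```

The optimal template bisects the two inputs, and the quadratic constraint is active at the optimum, so $u^*$ has unit norm and $t^*$ equals the worst-case correlation.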


C. Regularization and Kernelization of MCA

To construct a representative vector that is more robust to
outliers (see Fig. 1(b) for an example), we apply
regularization to MCA. Regularization is a popular technique to prevent
overfitting. Bertsimas and Copenhaver recently described a
unifying view of the connection between robustification¹ and
regularization [18].

¹ Immunizing a statistical problem against noise in the data.

Specifically, we introduce a non-negative ‘slack’ variable $\xi_i$
for each object $x_i$, which can help the optimization problem
find a solution insensitive to outliers. Using the slack variables,
we can describe the regularized version of the QCLP (2) as

\[
\begin{aligned}
\underset{t,\, \xi,\, u}{\text{maximize}} \quad & t - \frac{\lambda}{n} \sum_{i=1}^{n} \xi_i \\
\text{subject to} \quad & u^T x_i \ge t - \xi_i, \quad i = 1, \ldots, n \\
& \xi_i \ge 0, \quad i = 1, \ldots, n \\
& u^T u \le 1
\end{aligned}
\tag{3}
\]

where $\lambda$ is a user-specified sensitivity parameter for the slack
variables that serves as a regularization parameter; larger $\lambda$
leads to a template vector that is more sensitive to outliers.
Subsection III-E presents more details of $\lambda$ and its effect on
the solution of the optimization problem. Fig. 1 presents the
geometric interpretation.

This formulation is similar to the optimization problem
within the soft-margin support vector machine (SVM) [17],
which is a relaxation of the original SVM. Leveraged by the
regularization, the soft-margin SVM is more robust to labeling
error, and we expect the proposed R-MCA to have the same
advantage over the original MCA.

In order to facilitate understanding of (3), we derive its
Lagrange dual [19] problem. First of all, we define the
Lagrangian $L: \mathbb{R} \times \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^n \times \mathbb{R}^n \times \mathbb{R} \to \mathbb{R}$ associated
with the problem (3) as

\[
L(t, \xi, u, v, w, z) = -t + \frac{\lambda}{n} \sum_{i=1}^{n} \xi_i - z\left(1 - u^T u\right)
- \sum_{i=1}^{n} \left( v_i \left( u^T x_i - t + \xi_i \right) + w_i \xi_i \right)
\tag{4}
\]

where $v = (v_1, \ldots, v_n)^T$, $w = (w_1, \ldots, w_n)^T \in \mathbb{R}^n$, and
$z \in \mathbb{R}$ are the Lagrange multipliers for the three inequality
constraints of (3). We then define the Lagrange dual function
$g$ as the minimum value of the Lagrangian over $t$, $\xi$, and $u$:

\[
g(v, w, z) = \inf_{t,\, \xi,\, u} L(t, \xi, u, v, w, z).
\tag{5}
\]

To calculate the infimum of the Lagrangian, we partially
differentiate the Lagrangian as follows:

\[
\begin{aligned}
\frac{\partial L}{\partial t} &= -1 + \sum_{i=1}^{n} v_i = 0
  &&\Rightarrow\quad \sum_{i=1}^{n} v_i = 1 \\
\frac{\partial L}{\partial \xi_i} &= \frac{\lambda}{n} - v_i - w_i = 0
  &&\Rightarrow\quad \frac{\lambda}{n} = v_i + w_i \\
\frac{\partial L}{\partial u} &= -\sum_{i=1}^{n} v_i x_i + 2zu = 0
  &&\Rightarrow\quad u = \frac{1}{2z} \sum_{i=1}^{n} v_i x_i.
\end{aligned}
\]


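From the above equalities, we can rewrite the Lagrange
dual function (5) as $g(v, w, z) = -\frac{1}{4z} v^T C v - z$, where
$C_{ij} = x_i^T x_j$. We can consider this as a function of $z$ that
is maximized when $z^* = \sqrt{v^T C v}/2 \ge 0$. Thus, we can
obtain a simplified representation of $g(v, w, z)$ as $-\sqrt{v^T C v}$
by substituting $z = z^*$ into the above dual function. Maximizing
$-\sqrt{v^T C v}$ is equivalent to minimizing $v^T C v$, so we can
finally formulate the dual problem of (3) as

\[
\begin{aligned}
\text{minimize} \quad & v^T C v \\
\text{subject to} \quad & v_i \ge 0, \; w_i \ge 0, \quad i = 1, \ldots, n \\
& \lambda / n = v_i + w_i, \quad i = 1, \ldots, n \\
& \sum_{i=1}^{n} v_i = 1.
\end{aligned}
\tag{6}
\]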
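
The substitution step behind the simplified dual function can be spelled out as follows. This is a sketch of the intermediate algebra, using only the stationarity conditions stated above and the quadratic term written as $z(u^T u - 1)$, which is the form consistent with the condition $-\sum_i v_i x_i + 2zu = 0$.

```latex
\begin{aligned}
&\text{Terms in } t:\quad -t + t \sum_{i} v_i = 0, \qquad
 \text{terms in } \xi_i:\quad \xi_i \left( \tfrac{\lambda}{n} - v_i - w_i \right) = 0. \\
&\text{Remaining terms with } u = \tfrac{1}{2z} \sum_i v_i x_i: \\
&\quad z\,u^T u - \sum_i v_i\, u^T x_i - z
 = \frac{1}{4z} v^T C v - \frac{1}{2z} v^T C v - z
 = -\frac{1}{4z} v^T C v - z. \\
&\text{Stationarity in } z > 0:\quad \frac{1}{4z^2} v^T C v - 1 = 0
 \;\Longrightarrow\; z^* = \tfrac{1}{2}\sqrt{v^T C v}, \qquad
 g(v, w, z^*) = -\sqrt{v^T C v}.
\end{aligned}
```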

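Additionally, we can combine the top two constraints of (6)
into an inequality ‘$\lambda/n \ge v_i \ge 0$ for all $i$’, because $v_i$ and
$w_i$ are complements to each other. The problem now can be
described as follows:

\[
\begin{aligned}
\text{minimize} \quad & v^T C v \\
\text{subject to} \quad & (\lambda / n)\,\mathbf{1} \succeq v \succeq 0 \\
& \mathbf{1}^T v = 1.
\end{aligned}
\tag{7}
\]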


We can identify that (7) is a convex quadratic program (QP)
since the Gram matrix $C$ is positive semidefinite. Hence, when
(7) has a feasible solution, by the strong duality principle [19],
the template vector of R-MCA, $u^* \in \mathbb{R}^m$ (the primal solution),
can be obtained from the solution of (7), $v^* \in \mathbb{R}^n$ (the dual
solution), as follows:

\[
u^* = c^* \sum_{i=1}^{n} v_i^* x_i
\tag{8}
\]

in which $c^* = 1/\sqrt{v^{*T} C v^*}$.
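
The dual QP and the recovery of the template can be sketched in a few lines of plain Python. The solver below is a simple Frank-Wolfe loop chosen only because it needs no external library; it is an assumption of this sketch and not the interior-point solver used in the paper. The function names (`solve_dual`, `template`) and the toy data are hypothetical.

```python
import math

def gram(X):
    """Gram matrix C with C[i][j] = x_i^T x_j."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]

def matvec(C, v):
    return [sum(cij * vj for cij, vj in zip(row, v)) for row in C]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def solve_dual(C, lam, iters=2000):
    """Frank-Wolfe sketch: minimize v^T C v s.t. 0 <= v_i <= lam/n, sum(v) = 1."""
    n = len(C)
    if lam < 1:
        raise ValueError("infeasible: lambda must be at least 1")
    cap = min(lam / n, 1.0)
    v = [1.0 / n] * n                      # feasible start (centroid weights)
    for _ in range(iters):
        g = [2.0 * gi for gi in matvec(C, v)]   # gradient of v^T C v
        # Linear minimization over the capped simplex: fill coordinates with
        # the smallest gradient entries up to the cap until the mass is used.
        s, mass = [0.0] * n, 1.0
        for i in sorted(range(n), key=lambda k: g[k]):
            s[i] = min(cap, mass)
            mass -= s[i]
            if mass <= 0.0:
                break
        d = [si - vi for si, vi in zip(s, v)]
        slope = dot(g, d)
        if slope >= -1e-12:                # no descent direction left
            break
        curv = dot(d, matvec(C, d))
        # Exact line search for the quadratic objective, clipped to [0, 1].
        gamma = 1.0 if curv <= 0.0 else min(1.0, -slope / (2.0 * curv))
        v = [vi + gamma * di for vi, di in zip(v, d)]
    return v

def template(X, v, C):
    """u* = c* sum_i v_i x_i with c* = 1 / sqrt(v^T C v)."""
    c_star = 1.0 / math.sqrt(dot(v, matvec(C, v)))
    return [c_star * sum(vi * xi[k] for vi, xi in zip(v, X))
            for k in range(len(X[0]))]

# Hypothetical toy data: three unit vectors in the plane.
X = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
C = gram(X)
v = solve_dual(C, lam=3.0)                 # lam = n: upper bounds inactive
u = template(X, v, C)
print(v, u)
```

With `lam` equal to the number of objects, the weights concentrate on the two extreme vectors and the recovered template has unit norm; Frank-Wolfe converges slowly near the boundary, so a dedicated QP solver would be preferable in practice.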

D. Kernelization 

The nonlinearities in input space can often be handled better
in high (possibly infinite) dimensional space. The mapping to
and the computation in such high-dimensional spaces can be
costly, if not impossible, but when the input data are used only
through inner products, we can use the so-called kernel trick
to perform implicit mapping and efficient computation.
Inspecting the dual form (7) immediately suggests that
we can apply the kernel trick to R-MCA. Replacing the inner
products in (7) with a kernel matrix $K$ yields

\[
\begin{aligned}
\text{minimize} \quad & v^T K v \\
\text{subject to} \quad & (\lambda / n)\,\mathbf{1} \succeq v \succeq 0 \\
& \mathbf{1}^T v = 1,
\end{aligned}
\tag{9}
\]

where $K_{ij} = k(x_i, x_j)$ for a Mercer kernel $k$. The
kernelization allows the proposed R-MCA to find template vectors
from data with nonlinearities, thus extending the applicability
of the R-MCA. Section IV-D presents more details of the
kernelization and supporting experimental results.
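
As a rough illustration of the kernelized form, the sketch below builds a kernel matrix and evaluates the correlation between the implicit template and a new point in feature space. The closed form used in `kernel_correlation` is derived here under the assumption that the template is $u^* = c^* \sum_i v_i \varphi(x_i)$ with $c^* = 1/\sqrt{v^T K v}$ for a feature map $\varphi$ induced by the kernel; the function names and the choice of an RBF kernel are illustrative.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def kernel_matrix(X, kernel):
    """K[i][j] = k(x_i, x_j); takes the place of the Gram matrix C."""
    return [[kernel(xi, xj) for xj in X] for xi in X]

def kernel_correlation(v, X, x_new, kernel):
    """Correlation between the implicit template and phi(x_new).

    Assumes u* = c* sum_i v_i phi(x_i), which gives
    sum_i v_i k(x_i, x) / (sqrt(v^T K v) * sqrt(k(x, x))).
    """
    K = kernel_matrix(X, kernel)
    n = len(v)
    vKv = sum(v[i] * K[i][j] * v[j] for i in range(n) for j in range(n))
    num = sum(vi * kernel(xi, x_new) for vi, xi in zip(v, X))
    return num / (math.sqrt(vKv) * math.sqrt(kernel(x_new, x_new)))

# Hypothetical usage with uniform dual weights on two points.
X = [[0.0, 0.0], [1.0, 0.0]]
v = [0.5, 0.5]
print(kernel_correlation(v, X, [0.0, 0.0], rbf_kernel))
```

Only kernel evaluations are needed, so the template never has to be formed explicitly in the (possibly infinite-dimensional) feature space.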

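Similarly to the derivation for R-MCA, in order to obtain a
dual for MCA, we first write the Lagrangian of (2) and its partial
derivatives as follows:

\[
L(t, u, v, z) = -t - \sum_{i=1}^{n} v_i \left( u^T x_i - t \right) - z\left(1 - u^T u\right)
\]

\[
\begin{aligned}
\frac{\partial L}{\partial t} &= -1 + \sum_{i=1}^{n} v_i = 0 \\
\frac{\partial L}{\partial u} &= -\sum_{i=1}^{n} v_i x_i + 2zu = 0.
\end{aligned}
\]

The Lagrange dual function of MCA thus becomes
$g(v, z) = -\frac{1}{4z} v^T C v - z$, just as in R-MCA. By inserting
$z^* = \sqrt{v^T C v}/2$ into this Lagrange dual function, we can
formulate the dual of the original MCAP formulation (2) as

\[
\begin{aligned}
\text{minimize} \quad & v^T C v \\
\text{subject to} \quad & v \succeq 0 \\
& \mathbf{1}^T v = 1.
\end{aligned}
\tag{10}
\]

The above is the same as (7) except that the constraint
$(\lambda/n)\,\mathbf{1} \succeq v$ is missing. In other words, the dual of R-MCA (7)
becomes the dual of MCA (10) if $\lambda \ge n$, because the upper
bound constraints in (7) then become inactive.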

The kernelized version of the original MCA can also be
derived in a similar way to Section III-C by replacing the
dot-products in the dual quadratic program (10) with a kernel:

\[
\begin{aligned}
\text{minimize} \quad & v^T K v \\
\text{subject to} \quad & v \succeq 0 \\
& \mathbf{1}^T v = 1
\end{aligned}
\tag{11}
\]

where the constraint $(\lambda/n)\,\mathbf{1} \succeq v$ included in the regularized
version (9) no longer appears.


TABLE I. Data Used in Our Experiments

Name         n       m       Number of classes (description)
KSC [20]     5211    176     13 (land cover types)
MNIST [21]   10000   784     10 (digits ‘0’-‘9’)
SONAR [22]   208     60      2 (rock or mine)
GEO [23]     606     30954   2 (ulcerative colitis patient or not)
3D-NUT       272     3       2 (core or shell)

E. Analysis of $\lambda$ and a Comparison with MCA

We elaborate on the characteristics of the template vectors
obtained by R-MCA using the dual form (7). To satisfy the
constraint $\mathbf{1}^T v = 1$ therein, we consider the following four
cases:

1) [$\lambda < 1$] Every Lagrange multiplier $v_i$ is bounded above by
$\lambda/n < 1/n$, so the constraint $\sum_i v_i = 1$ cannot be satisfied.
That is, (7) is not feasible if $\lambda < 1$.

2) [$\lambda = 1$] Because $\lambda/n = 1/n$ is the upper bound of the $v_i$'s, the only
solution that fits the constraint $\sum_i v_i = 1$ must be $v_i = 1/n$
for all $i$. In this case, $u^*$ points in the same direction
as the centroid of the $x_i$'s with the scaling factor
($u^* = c^* \sum_i v_i^* x_i = c^* \sum_i x_i / n$).

3) [$1 < \lambda < n$] Larger $\lambda$ makes the $v_i$'s less constrained;
when $\lambda$ becomes large, the upper bound constraints for the
$v_i$'s become less restrictive for minimizing the objective
$v^T C v$. Hence, the effect of each individual example
$x_i$, including the outliers, on the primal solution (8) can
increase as $\lambda$ increases.

4) [$\lambda \ge n$] If $\lambda \ge n$, the upper bound constraints for the $v_i$'s
disappear; this follows from the fact that $v$ is forced
to be a probability vector by the other constraints, and
thus it will always satisfy the upper bound constraints
when $\lambda \ge n$. By comparing (7) with the dual of the
original MCA formulation (10), we deduce that
the solution of R-MCA for this case coincides with that
of MCA.
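
The boundary cases of the analysis above follow from a one-line bound on the sum of the multipliers. The short derivation below is added for illustration and uses only the constraints of the dual form, namely $0 \le v_i \le \lambda/n$ and $\sum_i v_i = 1$.

```latex
\begin{aligned}
&\sum_{i=1}^{n} v_i \;\le\; n \cdot \frac{\lambda}{n} \;=\; \lambda
  &&\Longrightarrow\quad \lambda < 1 \text{ makes } \textstyle\sum_i v_i = 1 \text{ impossible;} \\
&\lambda = 1:\quad \sum_{i=1}^{n} \left( \tfrac{1}{n} - v_i \right) = 0,\;\; \tfrac{1}{n} - v_i \ge 0
  &&\Longrightarrow\quad v_i = \tfrac{1}{n} \text{ for all } i; \\
&\lambda \ge n:\quad v_i \;\le\; \sum_{j=1}^{n} v_j \;=\; 1 \;\le\; \frac{\lambda}{n}
  &&\Longrightarrow\quad \text{the upper bounds are never active.}
\end{aligned}
```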

F. Complexity Analysis 

Recall that $n$ corresponds to the number of objects and $m$
corresponds to the dimensionality. Since the number of
iterations that is necessary for IPM to find a solution is practically
constant (typically from 10 to 50) [19], we can see that the
QCLP (2) can be solved in $O(nm^2 + m^3)$ flops. For
comparison, the number of flops required for the iterative method
[11] is either $4mnp - mp^2$ or $4n^2 p - 2np^2 + mn^2$, depending
on the implementation, where $p$ is the number of iterations.
The empirical study in [11] shows that $p$ grows nearly linearly
in $n$. As a result, MCA has $O(n^2 m)$ or $O(n^3 + mn^2)$
time complexity due to $p = O(n)$, while the proposed
QCLP formulation takes $\min(O(nm^2 + m^3), O(n^3))$. The
computational efficiency will be demonstrated in Section IV-E.
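
The choice between the primal and the dual can be illustrated with a back-of-the-envelope estimate. The sketch below plugs $n$ and $m$ into the asymptotic orders quoted above with all constants dropped, so the numbers are only indicative; the function names are hypothetical.

```python
def estimated_flops(n, m):
    """Rough operation counts from the quoted orders (constants dropped)."""
    return {
        "primal": n * m**2 + m**3,   # QCLP primal: O(n m^2 + m^3)
        "dual": n**3,                # dual QP over n variables: O(n^3)
        "iterative": n**2 * m,       # iterative MCA with p = O(n): O(n^2 m)
    }

def preferred_form(n, m):
    """Pick whichever of the primal or dual has the smaller estimate."""
    est = estimated_flops(n, m)
    return "primal" if est["primal"] <= est["dual"] else "dual"

# Sizes taken from Table I: MNIST (n = 10000, m = 784), GEO (n = 606, m = 30954).
print(preferred_form(10000, 784))
print(preferred_form(606, 30954))
```

Because the constants are ignored, the crossover point predicted by this estimate should not be read as exact; it only reflects the qualitative rule that the primal is attractive when $n$ is much larger than $m$ and the dual when $m$ is much larger than $n$.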

IV. Experimental Results and Discussion 

We tested the proposed R-MCA methodology using the
datasets listed in Table I. More details about each dataset will
be provided in the following subsections.

For our experiments, we implemented the proposed QCLP-based
MCA and R-MCA solvers using the SeDuMi software, a
MATLAB toolbox for optimization over symmetric cones [24].
For comparison, we also prepared implementations of the
original iterative solution to MCA as described in [11], the
support vector machine (SVM), and logistic regression.

A. Effect of Regularization on Subtype Correlation 

To see the effect of regularization, we ran R-MCA with
different values of the parameter $\lambda$ on a multiple-subclass dataset
and measured the variation of correlation between subclass
objects and the aggregate template. We used the Kennedy Space
Center (KSC) dataset [20], which contains 5211 vectors with
176 dimensions. Each vector represents the signal intensities of
different wavelengths measured above 13 types of land covers
(105-927 vectors per class). Based on the characterization
of vegetation, these classes can be grouped into three types
or ‘superclasses’ (upland with seven land-cover subclasses,
wetland with five, and water type with one).

Fig. 2: Effects of regularization and its parameter $\lambda$. (a)
The minimum correlation between aggregate templates and
subclass templates of the KSC data. (b) The part of the MNIST
data representing the digits ‘1’ (yellow) and ‘4’ (blue).
[Axis label recovered from the figure: First Principal Component.]

Fig. 2(a) shows the correlation of seven subclasses of
the ‘upland’ class with the regularized maximin aggregate
templates (r-maximin) of five different $\lambda$ values (1.4, 1.8, 2.2,
2.6, 3.0). As mentioned earlier, $\lambda$ controls
the degree of regularization: a smaller $\lambda$ increases the best-case
correlation value with the class members at the cost of
the worst-case correlation. To verify this effect, the curves for
the non-regularized maximin and the centroid template are
also presented. As expected, the curves for the r-maximin are
placed between the centroid and the non-regularized maximin.

B. Effect of Regularization Parameter A 

In Section IV-A, we discussed that the regularization
parameter $\lambda$ works as a control knob that places the result
from using the r-maximin somewhere between those from the
centroid template and the non-regularized maximin template.
To visualize the effects of varying $\lambda$, we utilized the MNIST
database of handwritten digits [21]. From this database, we
sampled 1135 and 982 images representing the digits ‘1’ and
‘4’, respectively. Each sample is a 28 x 28 image that can be
represented by a 784-dimensional vector. We carried out the
PCA of these samples and took the first two principal
components only, transforming each of them into a 2-dimensional
point, as shown in Fig. 2(b). In the figure, each of the two
insets magnifies the centroid and the r-maximin along with
the corresponding images for visual inspection.

Fig. 3: Regularization improves classification performance
using SONAR. (a, b) ROC curves from 2-fold cross
validation. (c) AUC values ($\mu \pm \sigma$) from 5-fold cross validation.
[Legend values recovered from the figure. ROC panels, two AUC
values per classifier: R-maximin ($\lambda$=3): 0.936, 0.941; SVM (rbf):
0.903, 0.938; SVM (linear): 0.864, 0.915; Logistic Regression:
0.885, 0.817; Maximin: 0.879, 0.811; Centroid: 0.836, 0.833.
Panel (c): R-maximin ($\lambda$=3): 0.910±0.024; SVM (rbf):
0.870±0.022; SVM (linear): 0.876±0.047; Logistic Regression:
0.880±0.053; Maximin: 0.857±0.051; Centroid: 0.796±0.051.]

Recall from Section III-E that R-MCA eventually produces
a centroid when $\lambda = 1$. As depicted in Fig. 2(b), we tested
10 different $\lambda$ values of the interval [1.1, 2.0] to draw the
trajectories of the r-maximin. The centroid in the class ‘1’ is
located in the upper-right region, because most samples in the
‘1’ class are distributed in that region. However, it is necessary
to shift the aggregate template toward the outliers in order
to minimize worst-case classification risk. We confirmed that
reducing $\lambda$ puts the regularized maximin template near the
centroid, and increasing $\lambda$ yields the r-maximin close to the
outliers.


C. Effect of Regularization on Classification 

To see the regularization effects in the context of
classification, we carried out binary classification of the SONAR
data [22], which consist of 111 mine-reflected and 97
rock-reflected sonar signals of 60 dimensions each. For NN
classification using templates, we implemented the nearest template
classifier that assigns an unknown vector to the class of
its nearest (r-maximin, maximin, or centroid) template. For
comparison, we also tested logistic regression, the linear SVM,
and the RBF kernel SVM.

According to the experimental results from using neural
networks in [22], nonlinearities exist in the distribution
characteristics of the SONAR data. We thus preprocessed the data
using the kernel PCA [25] with the Gaussian kernel ($\sigma = 1$).
We then tested the five different classifiers with 2-fold and 5-fold
cross-validation. The value of $\lambda$ was determined by performing
the CV with 4 different $\lambda$ values (1.5, 2, 2.5, 3). The soft
margin coefficient and the sigma of the RBF kernel in the SVM
were set to 1 and 3, respectively.
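
The nearest template classifier described above can be sketched as follows. This is a minimal illustration assuming that "nearest" means highest correlation with the class template; the template values and labels are made up and are not the templates learned in the experiment.

```python
import math

def correlation(u, x):
    """u^T x / (||u||_2 ||x||_2) for non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, x))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in x)))

def nearest_template(x, templates):
    """Return the label whose template has the highest correlation with x."""
    return max(templates, key=lambda label: correlation(templates[label], x))

# Hypothetical two-class templates (e.g. r-maximin, maximin, or centroid).
templates = {"mine": [1.0, 0.2], "rock": [0.1, 1.0]}
print(nearest_template([0.9, 0.1], templates))
print(nearest_template([0.2, 0.7], templates))
```

The same function works with any of the three template types, since only the template vectors change and not the decision rule.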
Fig. 4: Effect of kernelization (data: 3D-NUT). (a) The true
membership (best viewed in color). (b) The membership
retrieved by the proposed kernelized R-MCA. (c) The
membership assigned by the R-MCA.

Fig. 3(a) and (b) show the receiver operating characteristic
(ROC) curves from the first and second rounds of CV with $\lambda$ =
3. The average area under the curve (AUC) values are 0.94,
0.90, 0.86, 0.89, 0.88, and 0.84 for NN with the r-maximin
templates, the RBF SVM, the linear SVM, logistic regression,
NN with the maximin templates, and NN with the centroid
templates, respectively. Fig. 3(c) also presents the average and
standard deviation of 5 runs. The r-maximin classifier achieved
3.3% higher AUC on average than the alternatives.

With respect to these AUC values, the r-maximin
classification produced the best result, whereas the performance of
the original maximin classification was lower than that of the
SVM. This result suggests that the regularization can indeed
improve the classification accuracy for real applications with
noise.

D. Effect of Kernelization 

Through kernelization, we expect R-MCA to become
applicable to classification problems that contain complex shapes
in the input space. Fig. 4 shows the result from a
proof-of-concept experiment using a synthetic dataset termed
3D-NUT, which was generated as follows: we sampled a point
$x = [x_1, x_2, x_3]$ from a trivariate normal distribution $\mathcal{N}(\mu, \Sigma)$,
where $\mu = [0, 0, 0]$ and $\Sigma = I$. For the sake of visualization,
$x$ was discarded if $x_2 < 0$ and $x_3 < 0$. Otherwise, we set the
membership of $x$ to the ‘core’ class if $\|x\| < 1$ and to the
‘shell’ class if $\|x\| > 2$.
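
The generation procedure for the synthetic data can be sketched as below. One detail is an assumption of this sketch: points with $1 \le \|x\| \le 2$ are dropped, since the description assigns them to neither class. The function name and the seed are hypothetical, and the resulting sample will not reproduce the exact 272 points used in the paper.

```python
import math
import random

def sample_3d_nut(num_points, seed=0):
    """Draw labeled points following the 3D-NUT description (sketch)."""
    rng = random.Random(seed)
    points, labels = [], []
    while len(points) < num_points:
        # Trivariate standard normal: zero mean, identity covariance.
        x = [rng.gauss(0.0, 1.0) for _ in range(3)]
        if x[1] < 0 and x[2] < 0:
            continue                  # discarded for the sake of visualization
        r = math.sqrt(sum(c * c for c in x))
        if r < 1:
            label = "core"
        elif r > 2:
            label = "shell"
        else:
            continue                  # assumption: the gap region is dropped
        points.append(x)
        labels.append(label)
    return points, labels

points, labels = sample_3d_nut(272)
print(labels.count("core"), labels.count("shell"))
```

The two classes are concentric, so no single direction in the input space separates them by correlation, which is why a kernel is needed in this experiment.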

Fig. 4(a) depicts the distribution of 272 points color-coded
with binary membership (either ‘core’ or ‘shell’ class) in
the input space. Applying the original R-MCA resulted in
incorrect classification, as shown in Fig. 4(c). In contrast, the
kernelized R-MCA (radial basis kernel with $\gamma = 1$) correctly
separates the data points according to their membership, as
shown in Fig. 4(b).

This experiment confirms that the kernelization works for
R-MCA, and that we will be able to apply the kernelized version
to other problems to which existing kernel-based methods (e.g.,
kernel PCA) can be applied.

E. Comparison of Execution Time with MCA 

We compare the runtime of the proposed QCLP-based
solution and the original iterative solution [11] to the maximin
correlation approach. To this end, we carried out two types of
experiments. One is varying the number of objects $n$ with
the dimensionality $m$ fixed, and the other involves varying $m$
with $n$ fixed. We measured the runtime using a Windows 7 PC
equipped with an Intel i5-3570K CPU (3.4GHz, 6MB, 5GT/s)
and 16GB RAM.

Fig. 5: Comparison of execution times. Each time point
represents the average of ten independent runs. (a) Varying
$n$ with fixed $m = 784$ (data: MNIST). (b) Varying $m$ with
fixed $n = 606$ (data: GEO).

Fig. 5(a) shows the varying-$n$ fixed-$m$ case for recognizing
the digit ‘0’ in the MNIST data (fixed $m = 784$). The
time demand of the iterative solution remained the highest
and also grew faster than the others. As described in
Section III, there are additional inequality constraints and
variables in the regularized forms [(3) and (7)] in comparison
with the original MCA [(2) and (10)]. Consequently, the two
regularized versions require longer execution times than the
unregularized ones when $n > m$, as shown in Fig. 5(a).

The varying-$m$ fixed-$n$ case is presented in Fig. 5(b). We
used the NCBI GEO microarray dataset [26] (the accession
number: GSE11223), which provides the regional variation of
gene expression in ulcerative colitis patients [23]. The dataset
has $m = 30954$ features and $n = 606$ samples (404 samples
were generated by adding white Gaussian noise to the original
202 samples). Even though $m$ increases, the runtime of the
dual forms [(7) and (10)] does not increase noticeably, because
$n \times n$ quadratic programming is involved in solving the dual
forms. In contrast, the time demand of solving the primal
forms [(2) and (3)] increases as $m$ grows. Consequently, if
$m > n$, the $n \times n$ quadratic programming would take less
time, and solving the dual forms would be better.

Note that we can observe abrupt changes in runtime from
both Fig. 5(a) and (b) at the point where $m = n$. This
originates from the design of the SeDuMi toolbox. It uses an
approximation based on Farkas' lemma [19] and finds the
solution $y \in \mathbb{R}^m$ such that $A^T y = 0$ if the solution $x \in \mathbb{R}^n$
does not exist for $Ax > 0$.

In summary, the primal and dual forms should yield the
same solution, and we can always solve either the original
MCA or the proposed R-MCA problems faster by using the
proposed QCLP formulation than using the original iterative
method. When $n > m$, using the primal forms [(2) and (3)]
will be advantageous; otherwise using the dual forms [(7) and
(10)] will be desirable. As the primal forms and the dual
forms have $O(m)$ and $O(n)$ variables, respectively, the same
observations can be made from the computational complexity
of SeDuMi, which is $O(x^2 y^{2.5} + y^{3.5})$ [24] ($x$ is the number
of variables, and $y$ is the number of independent inequalities).

V. Conclusion 

The maximin correlation approach (MCA) was originally
proposed in the context of multiple-subclass classification
problems that range from optical character recognition
to automated protein family prediction. The
aggregate templates found by MCA work well for such applications
since they can minimize the maximum misclassification risk
in the correlation-based nearest-neighbor classification setup.
Nonetheless, practical limitations such as susceptibility to
noise, inability to handle nonlinearities, and high
time demand have hindered a wider use of MCA in real
applications.

To address these drawbacks, we first described how to 
formulate the MCA as an instance of the QCLP and presented 
an efficient and general solution that can replace the original 
iterative solution. Based on this QCLP-based formulation, we 
further explained how to regularize and kernelize MCA in 
order to render it more robust to outliers and applicable to 
data with nonlinearities. 

According to our experimental results, the proposed R-MCA
successfully overcomes the limitations of the original MCA.
Leveraged by the regularization, the proposed method
outperformed the original MCA and the other alternatives tested
in terms of classification performance. Given that the degree
of regularization in R-MCA can be adjusted conveniently via
a single parameter, the proposed R-MCA provides a flexible
solution. In addition, we confirmed the computational benefit
of the QCLP formulation and the effectiveness of kernelization
in the (regularized) maximin correlation approach.

We anticipate that the kernelization and regularization of 
MCA will make MCA more appealing to a wider range of 
applications that we otherwise cannot satisfactorily analyze 
with the original MCA. 